Amino Acid Substitution Scores

Abstract

Here we discuss standard ways of assigning a score to each amino acid pair, i.e., to each possible column of a gap-free pairwise protein alignment. Examples of such scoring matrices include the PAM30, PAM70, BLOSUM80, BLOSUM62 and BLOSUM45 matrices that are available on NCBI’s blastp server. Such scores are appropriate for comparing two sequences about which we have no other information (as opposed to position-specific scores tailored for a particular protein family). Thus, we seek a 20-by-20 array of numbers for protein sequence comparisons.

Observation 1. There exist independent and reliable means of deciding whether a particular scoring matrix gives good results. Given an alignment that is optimal with respect to certain scores, does each non-gap column contain letters that are derived from the same ancestral letter by replacement operations? We can use 3-dimensional structural information to decide the “correct answer.” With database searches, we care more about the scores than about the actual alignment, so the question becomes: how many of the known homologs of the query sequence score higher than the highest-scoring unrelated sequence? A number of protein families are extremely well studied and can be used to answer such questions. In contrast, for non-coding DNA sequences, it is difficult to determine the “correct alignment.”

One approach to obtaining scores is to use knowledge of protein biochemistry to predict which amino-acid pairs are most likely to arise by replacement operations. However, it works better to simply look at a “training set” of correct alignments and observe the frequency of each kind of column. (Knowledge of biochemistry is of course important for determining the training set.)

To help motivate these ideas, ask yourself, “What should be the relationship between the score for aligning two As and the score for aligning two Ws?” The point is that A occurs much more frequently than W in a typical protein sequence. Most people will say that a W-over-W column should score higher than an A-over-A column. Intuitively, the reason is that W-over-W provides stronger evidence that the alignment is correct: it occurs in a chance alignment of unrelated sequences much less frequently than A-over-A does, even though W-over-W also appears somewhat less frequently in correct alignments than A-over-A does.

Observation 2. We use scores that assess the likelihood that the alignment’s columns are drawn from the population of correct columns, as compared to the likelihood that they are generated by chance. Accordingly, we use two statistical models of an alignment: one reflects biologically correct alignments and the other reflects chance alignments of unrelated sequences. (We’ll think of the columns as independent and identically distributed (i.i.d.). Markov chains are an alternative, though more complicated.) It is the ratio of the two probabilities that interests us most.

For the model of aligning unrelated sequences, we assign probabilities $q(x)$ to amino acids $x$, reflecting the frequency with which they appear in protein sequences. Thus, the probability that a random alignment of unrelated sequences happens to align the sequences $x_1 x_2 \ldots x_n$ and $y_1 y_2 \ldots y_n$ is $\prod_{i=1}^{n} q(x_i)\,q(y_i)$. Assuming that the 20 q-values ...
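The two-model comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's method: the background frequencies `q`, the correct-column frequencies `p`, and the function names are all assumptions introduced here, and the numbers are illustrative values chosen only to mimic "A is common, W is rare." The excerpt speaks only of the ratio of the two probabilities; taking its base-2 logarithm to obtain an additive score is a common convention and is likewise an assumption here.

```python
import math

# Assumed, illustrative background frequencies q(x) for two amino acids.
q = {"A": 0.074, "W": 0.013}

# Assumed, illustrative frequencies p(x, y) of columns in correct alignments.
p = {("A", "A"): 0.0215, ("W", "W"): 0.0065}


def chance_probability(xs, ys, q):
    """Probability of the gap-free alignment xs over ys under the
    unrelated-sequences model: the product of q(x_i) * q(y_i)."""
    if len(xs) != len(ys):
        raise ValueError("a gap-free alignment needs equal-length sequences")
    prob = 1.0
    for x, y in zip(xs, ys):
        prob *= q[x] * q[y]
    return prob


def odds_ratio(x, y, p, q):
    """Ratio of the column's probability under the correct-alignment model
    to its probability under the chance model."""
    return p[(x, y)] / (q[x] * q[y])


def log_odds(x, y, p, q):
    """Base-2 logarithm of the odds ratio (an assumed scoring convention)."""
    return math.log2(odds_ratio(x, y, p, q))


if __name__ == "__main__":
    print("chance P(AW over WA):", chance_probability("AW", "WA", q))
    print("A over A:", round(log_odds("A", "A", p, q), 2))
    print("W over W:", round(log_odds("W", "W", p, q), 2))
```

With these assumed numbers, W-over-W is rarer than A-over-A in correct alignments, yet it receives the higher score because it is far rarer still under the chance model, which matches the intuition given in the text.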

Related resources

Theoretical Determination of Amino Acid Substitution Groups based on Qualitative Physicochemical Properties

This paper introduces a novel method for theoretical determination of amino acid substitution groups. The method here involves making a binary matrix based on 48 qualitative physicochemical properties and calculating a substitution matrix based on this using dot products. Isolated groups with high scores are determined to be valid substitution groups and conserved groups are derived from these ...

Amino acid substitution matrices for protein conformation identification

Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. Although widely used, the matrices derived from homologous sequence segments, such as Dayhoff’s PAM matrices and Henikoff’s BLOSUM matrices, are not specific for protein conformation identification. Using a different approach...

Substitution of soybean with canola meal in laying hens diets formulated based on total and digestible amino acids on performance and blood parameters

An experiment was conducted to study the effects of substituting soybean meal (SBM) with canola meal (CM) and of formulating diets based on total and digestible amino acids on performance, egg quality, organ weights and blood parameters of laying hens from 73 to 83 weeks of age. A total of 128 laying hens were distributed in a completely randomized design in a 2×2 factorial arrangement with 2 protein ...

Amino acid substitution matrices from protein blocks (amino acid sequence / alignment algorithms / data base searching)

Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more tha...

Evaluation of the Effect of Less Negatively Charged Amino Acid Substitution in Synthetic Tetramer Peptide S3 Derived from Horseshoe Crab Ambocyte on its Antibacterial Properties

Introduction: The study of the effects of synthetic peptides with antibacterial properties can provide more effective antibiotics. This study designed, expressed, and investigated the Sushi 3 tetramer peptide. Subsequently, it was compared in terms of changing antibacterial properties with another Sushi3 tetramer peptide the aspartic acid and proline amino acids of which were replaced with glyc...

Coding single-nucleotide polymorphisms associated with complex vs. Mendelian disease: evolutionary evidence for differences in molecular effects.

Most Mendelian diseases studied to date arise from mutations that lead to a single amino acid change in an encoded protein. An increasing number of complex diseases have also been associated with amino acid-changing single-nucleotide polymorphisms (coding SNPs, cSNPs), suggesting potential similarities between Mendelian and complex diseases at the molecular level. Here, we use two different evo...

Publication date: 2008